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Abstract 

Our main purpose is to describe a very efficient MPP 
algorithm for performing one important class of Ising 
spin simulations. Results and physical significance of 
MPP calculations using the method described here 
will be discussed elsewhere. However, we will make a 
few comments on the problem under study and report 
briefly on results so far. Ted Einstein has provided us 
with much guidance in interpreting our initial results 
and in suggesting calculations to perform. 

1 Introduction 

Ising spin simulations occur in many areas of physics 
and have attracted the attention of researchers since 
the earliest days of electronic computation (Ref. 4). 
They provide a very good example of how, from the 
point of view of algorithm design, two apparently dis- 
parate problem types can be attacked by almost the 
same techniques. Some classes of Ising spin calcu- 
lations can require speeds far in excess of anything 
currently available. 

The basic model is easily described. The spin vari- 
able <r(i) is specified at the nodes of a uniform grid 
in two (or three) dimensions. At each grid site the 
spin can take on only the values -hi and -1. Spins 
are related to one another via the energy expression 
given by the Hamiltonian 

H = - Y, *(*>(/) 

where {i, j} ranges over all nearest neighbor pairs of 
sites. In an i x i square lattice, at site i we have the 
local energy expression 

E(i) = -a(i) X (<r(i -h 1) -h a(i — 1) 

+a(i + £) + a(i - £)). 


In most cases periodic boundary conditions are im- 
posed, so that i + 1 and t + i are to be determined 
mod L 

One wishes to compute various averages with re- 
spect to the probability P(C) for a configuration of 
spins, C, to occur. The “classical” algorithm for using 
Monte Carlo methods to sample configuration space 
is due to Metropolis, Rosenbluth, Rosenbluth, Teller 
and Teller (Ref. 4), (called the M(RT) 2 algorithm 
for short). It consists of a series of moves through 
configuration space, making use of the fact that for 
each C, P(C) oc exp(-JE(C)/kT). Here E(C) is the 
energy associated with configuration C, J is the cou- 
pling constant for the problem under study, T is the 
temperature and k is Boltzmann’s constant. A site i 
is chosen at random and the change in energy AFjt) 
which would result in reversing the spin at that site 
is determined. Since only the site i and its four near- 
est neighbors are involved, it is easy to see that the 
change in energy is 

AE(i) = (E'(i) - £({)) 

+(£'(* + l)-F(i + l)) 

+(£'(* + *)-£?(» + /)) 

+(£'(* 

= —4E(i) 

If A E(i) < 0, then the move is “accepted” and the 
sign of cr(i) is reversed. In case AF(i) > 0, the move 
is accepted with probability exp(— JAE/kT). 

It can be shown that the M(RT) 2 algorithm de- 
fines a Markov process which samples the “correct” 
(Boltzmann) distribution of configuration of spins. 
However, for a large system several hundreds of thou- 
sands, or even millions of updates of each site must be 
performed in order to approach a single equilibrium 
configuration, and often averages over many such con- 
figurations are required. Most of the work is in gen- 
erating the random numbers, since as many as 10 12 
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moves of sites may be required in a single simulation. 
We will delay discussion of this critical aspect for the 
moment, and concentrate on how to modify M(RT) 2 
in order to get an algorithm well suited to the MPP. 
Later we explain why this algorithm is also well suited 
to vector machines, and we then describe one method 
for vectorizing the generation and testing of random 
numbers. 

2 Algorithm 

The MPP consists of a square array of 128x128 single- 
bit processors, having an SIMD architecture. Proces- 
sors are interconnected via nearest neighbor bonds 
and periodic boundary conditions is one possible con- 
nection pattern for border processors. Because of 
its SIMD architecture, the MPP is extremely well 
adapted to image processing applications and, in fact, 
one point we hope to make is that, so far as algorithms 
are concerned, Ising spin simulations are a type of im- 
age processing problem. 

One is tempted to think of associating processors 
with spin sites in a one-one manner. However, this 
does not make very good use of the machine, since in 
the M(RT) 2 algorithm, only one site is examined for 
each move. Instead of mapping sites to processors in 
the obvious way, we instead notice that the expres- 
sions for the energy associated with a site are similar 
in form to the central difference approximation used 
to solve the Poisson equation, 

~V 2 u = p. 

In the Poisson case, a grid site and its four nearest 
neighbors are related through the finite difference ex- 
pression 

~u(i — t) - u(i — l) 

-Mu(i') 

— u(i + 1) - u(i 1) — (Ax)(Ay) * p(i). 

A common strategy used in implementing iterative 
methods for solving the Poisson equation is to use 
the “red-black ordering” depicted below: 

R B R B R B 

B R B R B R 

R B R B R B 

B R B R B R 

Since no pair of red sites are nearest neighbors, all red 
sites can be updated “simultaneously”. These values 


can then be used to update the black sites, and so- 
forth, alternating on each iteration between red and 
black sites. 

The same idea can be applied to modify M(RT) 2 . 
Spin flips can be attempted at all red sites or at all 
black sites. This is a different Markov process than 
M(RT) 2 but the same distribution is sampled. Of 
course it is the dynamical aspects which are of inter- 
est now, rather than merely the converged solution. 
This use of colors is part of the so-called multi-spin 
algorithm (Ref. 5). In order to implement multi-spin 
on the MPP, all sites of a single color can be associ- 
ated with a single MPP plane of 128x128 processors. 

The problem to which we have applied multi-spin is 
slightly more complicated than that of attempting to 
flip single sites (Glauber dynamics). Instead we want 
to try to exchange the spins of a pair of neighboring 
sites. These spin exchanges or Kawasaki dynamics 
calculations can be used to study phenomena such as 
growth of magnetic domains (Ref. 2). 

Because spin exchanges are to be attempted, eight 
different sites are involved in each move. For exam- 
ple to interchange sites k x and k 2 we have all of the 
depicted bonds to consider. The change in energy is 



the sum of changes over all eight sites and is given by 
the expression 

A E = ~2(a(k 2 ) - ^(fci)) 

* (cr(ti) + <r{i 2 ) + <r(i s ) 

-ff(ji) - <r{h) - °{js)} ■ 

In order to apply the multi-spin idea enough “col- 
ors” must be assigned so that no two sites of the same 
“color” are involved in the same move. This can be 
accomplished by partitioning sites into sixteen groups 
as is shown below: 
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Each group is now associated with a single 128x128 

MPP bit plane, so that a 512x512 simulation can be 
handled in a straight-forward way. 

Notice now that the part of the expression for 
A E which is enclosed in brackets can take on only 
seven distinct values, namely {-6, -4, -2, 0, 2, 4, 6}. 
The MPP array consists of single-bit processors and 
Boolean operations can be done simultaneously in a 
single operation on all 2 14 processors. The compu- 
tation of AE can be reduced to Boolean operations 
by first associating the spin values a ={ + 1,-1} with 
Boolean values = {1, 0} (which in effect makes a tran- 
scription to a lattice gas model), and next noticing 
that the value of the bracket part of AE corresponds 
to the sum of the 1 bits of the Boolean expression. For 
example: 

Spin version: 

{11 - 1 1 1 1 } 

Value = 4. 

Boolean version: 

{110 111 } 

Sum of bits = 5 


The complete correspondence is as follows: 


Spin Sum 

Bit Sum 

Representation 

6 

6 

110 

4 

5 

101 

2 

4 

100 

0 

s 

Oil 

-2 

2 

010 

-4 

1 

001 

-6 

0 

000 


The bit representation for the sum of bits can it- 
self be determined by defining summation of Boolean 
expressions by Boolean operations. Assume that 
BITS(I), I = 1,2,..., 6 is an array of bit planes each 
consisting of all 128x128 sites of a given color. The 
bit representation of the sum consists of three more 
bit planes Bl, B2, and B3, initially all zero. Addition 
is performed bit by bit with the proper rules for car- 
ries. The correct representation for the final values is 
obtained by executing the following loop. 


DO 10 1=1,6 

Bl = Bl .OR. (BITS (I) 

.AND. B3 .AND. B2) 

B2 = (.NOT. (BITS(I) 

.AND. B3) .XOR. (.NOT. B2)) 

B3 = (.NOT. BITS (I)) 

.XOR. (.NOT. B3) 

10 CONTINUE 

The sign of the sum of neighboring spins is deter- 
mined by the value of the high order bit Bl. This 
must be combined with the value of - 2 (cr(A 2 ) —a(Ai )) 
to determine when a random number needs to be com- 
pared with exp (— J AE/kT). Of course, the exponen- 
tial also takes only finitely many values, so that it is 
easy to parallelize the comparison step. The genera- 
tion of random numbers can be done simultaneously 
if a method which allows more than 2 14 seeds is used. 
We have adapted a program due to Marsaglia and 
Kahaner (Ref. 3), but other methods are possible. 
In any case, 2 14 pairs of sites are handled in a sin- 
gle step. The total number different types of nearest 
neighbor pairs ( < 1,2 >,< 6, 7 >,< 9, 5 >, etc.) is 
32 , and one should not cycle through these types in a 
fixed pattern because this marching introduces false 
dynamics into the simulation. 

3 Vector Machines 

Essentially the same algorithm can be used on a vec- 
tor machine such as the CYBER 205. The repre- 
sentations for A E can be computed very efficiently 
using bit vector operations. However, vector instruc- 
tions are not really the same as parallel instructions, 
and so the 2 14 random numbers which are required 
for each step take a lot of time to generate on a vec- 
tor machine. The way to alleviate this is to reduce 
even the comparison with random numbers to vector 
operations on bit arrays. Assume, for convenience, 
that the value of — 2 (<t(A: 2 ) — <r(£i)) is -4, and let 
a = exp ( — 8 J/kT). As in (Ref. 1), we create bit 
arrays D1 DO of length 2 14 where DlDO = 11 with 
probability a 3 ; DIDO = 10 with probability a 2 - a 3 , 
DlDO = 01 with probability a — a 2 and DlDO = 00 
with probability 1 — a. This is easily done by generat- 
ing 2 14 random numbers and noting where they fall in 
the intervals 13 = [0,a s ), 12 = [a 3 , a 2 ), II = [a 2 , a) 
and 10 = [a, 1]. The test can now be performed by 
computing the bits of (Bl B2 B3 + Dl DO) 0 001. 
Here the operation 0 means addition modified so that 
the high-order bit is not “turned off” by a carry op- 
eration (e.g. Ill 0 001 = 100). After this operation 
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has been performed, exchanges can be made at all 
sites with high-order bit equal to 1. 

A little reflection reveals that the above method 
performs exchanges with nearly the correct frequency, 
because the bit vectors DlDO approximate the cor- 
rect probability distribution. After DlDO have been 
used for a step, the vectors must be permuted in some 
random fashion, and to insure that the false period- 
icities are not introduced, the DlDO vector should 
occasionally be re-loaded by generating 2 14 new ran- 
dom numbers. This, of course, is much more efficient 
than generating a full set of random numbers for each 
step. 
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4 Preliminary Results 

The first program to implement the algorithm on 
the MPP was written in Parallel Pascal by Julie 
O’Connell. However, signigicant portions of it were 
later recoded in PEARL by Jim Abeles. The result is 
a code which performs better than 200 million tests 
of spin sites per second. Since our goal is to study 
the long time growth of domains, very long simulation 
times and averages over many different configurations 
are required. The present code makes this a practical 
possibility. 

As has been mentioned, in the case of spin ex- 
changes great care must be taken to avoid introducing 
any false periodicities into the results. This need for 
care is especially acute in our study because there is 
considerable controversy about the long time behav- 
ior of such models (Ref. 6). Preliminary runs on the 
MPP indicate that our algorithm is correct. We are 
now conducting more extensive tests of the reliability 
of our technique for using the random number gener- 
ator. 


References 

1. Bhanot, G., Duke, D. and Salvador, R. 1986. A 
fast Algorithm for the CYBER 205 to Simulate 
the 3-D Ising Model. J. Statistical Physics, to ap- 
pear. 

2. Huse, D. A. 1986. Late Stages of Spinodal 
Decomposition: Corrections to Lifshitz-Slyozov 
Scaling and Monte Carlo Simulations, preprint. 

3. Marsaglia, G. and Kahaner, D. Generation 
of Uniform and Normal Random Variables. 
preprint. 


55 


